In this activity, you will answer a series of questions based on the following reading:
O’Sullivan D and Unwin D (2010) Geographic Information Analysis, 2nd Edition, Chapters 8 and 9. John Wiley & Sons: New Jersey.
For this activity you will need the following:
This R markdown notebook.
A file called “Wolfcamp Aquifer.RData”
The data is a set of piezometric head (watertable pressure) observations of the Wolfcamp Aquifer in Texas (https://en.wikipedia.org/wiki/Hydraulic_head). Measures of pressure can be used to infer the flow of underground water, since water flows from high to low pressure areas.
These data were collected to evaluate the potential flow of contamination related to a high-level toxic waste repository in Texas. The Deaf Smith County site in Texas was one of three potential sites proposed for this repository. Beneath the site is a deep brine aquifer known as the Wolfcamp Aquifer that may serve as a conduit for contamination leaking from the repository.
The data set consists of 85 georeferenced measurements of piezometric head. Possible applications of interpolation include determining sites at risk, and quantifying the uncertainty of the interpolated surface in order to evaluate the best locations for monitoring stations.
It is good practice to clear the workspace to make sure that you do not have extraneous items there when you begin your work. The command in R to remove objects is rm (for “remove”), followed by a list of items to be removed. To clear the workspace of all objects, do the following:
rm(list = ls())
Note that ls() lists all objects currently in the workspace.
Load the libraries you will use in this activity (load other packages as appropriate).
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 2.0.1 v dplyr 0.7.8
## v tidyr 0.8.2 v stringr 1.3.1
## v readr 1.3.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(spdep)
## Loading required package: sp
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
##
## expand
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
library(geog4ga3)
Load dataset:
data("aquifer")
Add polynomial terms:
aquifer <- mutate(aquifer, X3 = X^3, X2Y = X^2 * Y, X2 = X^2,
XY = X * Y,
Y2 = Y^2, XY2 = X * Y^2, Y3 = Y^3)
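These helper columns are needed because `^` has a special meaning inside an R model formula (interaction crossing), so `X^2` written directly in a formula does not create a squared term. A minimal self-contained sketch with toy data (the data frame `d` is hypothetical) illustrates the difference, using `I()` to force arithmetic evaluation:

```r
# In formula algebra, x^2 collapses to x; I() forces arithmetic evaluation
d <- data.frame(x = 1:10, y = (1:10)^2)
fit_wrong <- lm(y ~ x^2, data = d)        # silently fits y ~ x only
fit_right <- lm(y ~ I(x^2) + x, data = d) # fits a true quadratic
length(coef(fit_wrong)) # 2 coefficients: intercept and x
length(coef(fit_right)) # 3 coefficients: intercept, I(x^2), and x
```

Either `I()` terms or precomputed columns (as above) work; the precomputed columns keep the formulas shorter.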
Linear:
trend1 <- lm(formula = H ~ X + Y, data = aquifer)
summary(trend1)
##
## Call:
## lm(formula = H ~ X + Y, data = aquifer)
##
## Residuals:
## Min 1Q Median 3Q Max
## -367.00 -161.41 -30.74 148.23 651.17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2591.3266 38.9636 66.51 <2e-16 ***
## X -6.7519 0.3439 -19.64 <2e-16 ***
## Y -5.9862 0.4066 -14.72 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 203.3 on 82 degrees of freedom
## Multiple R-squared: 0.892, Adjusted R-squared: 0.8894
## F-statistic: 338.8 on 2 and 82 DF, p-value: < 2.2e-16
Quadratic (note that inside an R model formula `^` denotes interaction crossing rather than a power, so the precomputed columns must be used, or the powers wrapped in `I()`):
trend2 <- lm(formula = H ~ X2 + X + XY + Y + Y2, data = aquifer)
summary(trend2)
##
## Call:
## lm(formula = H ~ X^2 + X + XY + Y + Y^2, data = aquifer)
##
## Residuals:
## Min 1Q Median 3Q Max
## -406.31 -138.87 -13.05 129.29 722.49
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.627e+03 3.833e+01 68.537 < 2e-16 ***
## X -8.288e+00 5.659e-01 -14.646 < 2e-16 ***
## XY 2.453e-02 7.402e-03 3.314 0.00137 **
## Y -6.648e+00 4.327e-01 -15.363 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 192 on 81 degrees of freedom
## Multiple R-squared: 0.9049, Adjusted R-squared: 0.9014
## F-statistic: 257 on 3 and 81 DF, p-value: < 2.2e-16
Cubic (again using the precomputed polynomial columns):
trend3 <- lm(formula = H ~ X3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3, data = aquifer)
summary(trend3)
##
## Call:
## lm(formula = H ~ X^3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3,
## data = aquifer)
##
## Residuals:
## Min 1Q Median 3Q Max
## -394.66 -132.51 -1.24 97.80 497.00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.477e+03 1.120e+02 22.115 < 2e-16 ***
## X -8.432e+00 9.381e-01 -8.989 1.4e-13 ***
## X2Y 3.870e-04 1.540e-04 2.513 0.0141 *
## X2 -1.868e-02 9.065e-03 -2.061 0.0427 *
## XY 3.448e-02 2.766e-02 1.247 0.2164
## Y 1.460e+00 4.862e+00 0.300 0.7647
## Y2 -9.601e-02 5.791e-02 -1.658 0.1015
## XY2 -1.162e-04 1.614e-04 -0.720 0.4738
## Y3 2.905e-04 2.012e-04 1.444 0.1529
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 179.6 on 76 degrees of freedom
## Multiple R-squared: 0.9219, Adjusted R-squared: 0.9137
## F-statistic: 112.1 on 8 and 76 DF, p-value: < 2.2e-16
Based on the adjusted R-squared values (0.889 linear, 0.901 quadratic, 0.914 cubic), the most likely choice is the cubic trend surface model.
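The choice can also be checked formally with incremental F-tests on the nested models via `anova()` (a sketch, assuming the `trend1`, `trend2`, and `trend3` objects fitted above):

```r
# Incremental F-tests: does each higher-order trend surface
# significantly reduce the residual sum of squares?
anova(trend1, trend2, trend3)
```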
Use `predict` to interpolate the field using your chosen model, then plot the interpolated field using a method of your choice (e.g., ggplot2, a 3D plot, etc.). First, find the domain of the coordinates:
summary(aquifer[,1:2])
## X Y
## Min. :-145.24 Min. : 9.414
## 1st Qu.: -21.30 1st Qu.: 33.682
## Median : 11.66 Median : 59.158
## Mean : 16.89 Mean : 79.356
## 3rd Qu.: 70.90 3rd Qu.:131.825
## Max. : 112.80 Max. :184.766
Coordinates for prediction:
X.p <- seq(from = -140, to = 110, by = 5)
Y.p <- seq(from = 10, to = 180, by = 5)
Use these to create interpolation grid:
df.p <- expand.grid(X = X.p, Y = Y.p)
Calculate polynomial terms:
df.p <- mutate(df.p, X3 = X^3, X2Y = X^2 * Y, X2 = X^2,
XY = X * Y,
Y2 = Y^2, XY2 = X * Y^2, Y3 = Y^3)
Obtain predictions:
preds <- predict(trend3, newdata = df.p, se.fit = TRUE, interval = "prediction", level = 0.95)
Append the predictions, intervals, standard errors to the prediction dataframe:
df.p <- data.frame(df.p, z.p = preds$fit[,1], lwr = preds$fit[,2], upr = preds$fit[,3], se.fit = preds$se.fit)
Plot the trend surface:
ggplot(data = aquifer, aes(x = X, y = Y)) +
geom_tile(data = df.p, aes(x = X, y= Y, fill = z.p)) +
scale_fill_distiller(palette = "YlOrRd", trans = "reverse") +
geom_point(aes(color = H)) +
scale_color_distiller(palette = "YlOrRd", trans = "reverse") +
geom_point(shape = 1, size = 2) +
coord_equal()
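Contour lines can be overlaid on the tiles to make the shape of the trend surface easier to read (a sketch; the contour `binwidth` of 200 is an arbitrary choice):

```r
# Trend surface with contour lines every 200 units of head
ggplot(data = df.p, aes(x = X, y = Y)) +
  geom_tile(aes(fill = z.p)) +
  geom_contour(aes(z = z.p), color = "black", binwidth = 200) +
  scale_fill_distiller(palette = "YlOrRd", trans = "reverse") +
  coord_equal()
```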
As an alternative:
Convert predictions to matrix for plotting in plotly:
z.p <- matrix(preds$fit[,1], nrow = 35, ncol = 51, byrow = TRUE)
Plot:
plot_ly(x = ~X.p, y = ~Y.p, z = ~z.p, type = "surface") %>%
add_markers(data = aquifer, x = ~X, y = ~Y, z = ~H)
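A note on the reshape above: `expand.grid` varies its first argument fastest, so with `byrow = TRUE` each row of the matrix corresponds to one value of `Y.p` (35 rows) and each column to one value of `X.p` (51 columns), which is the orientation `plot_ly` expects for surface traces. This can be verified with a self-contained check:

```r
# expand.grid varies X fastest, so byrow = TRUE puts one Y-value per row
xp <- seq(from = -140, to = 110, by = 5) # 51 values
yp <- seq(from = 10, to = 180, by = 5)   # 35 values
g <- expand.grid(X = xp, Y = yp)         # 1785 rows, X varies fastest
m <- matrix(g$X, nrow = length(yp), ncol = length(xp), byrow = TRUE)
all(m[1, ] == xp) # TRUE: the first row sweeps all X values at the first Y
```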
The simplest way to examine the range of the predictions and their intervals is by looking at the descriptive statistics of the fit:
summary(preds$fit)
## fit lwr upr
## Min. :1147 Min. : 757.9 Min. :1536
## 1st Qu.:1644 1st Qu.:1268.4 1st Qu.:2021
## Median :2102 Median :1703.6 Median :2488
## Mean :2164 Mean :1751.9 Mean :2577
## 3rd Qu.:2676 3rd Qu.:2212.6 3rd Qu.:3102
## Max. :3354 Max. :2946.3 Max. :4388
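The width of the prediction interval gives a direct per-location measure of uncertainty (a sketch, using the `preds` object created above):

```r
# Interval width at each grid point: wider intervals = less certain predictions
interval_width <- preds$fit[, "upr"] - preds$fit[, "lwr"]
summary(interval_width)
```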
The range can be plotted as follows. The fit and interval bounds (`z.p`, `lwr`, `upr`) were already appended to `df.p` above, so the lower and upper prediction limits can be plotted directly:
ggplot(data = df.p, aes(x = X, y = Y, fill = lwr)) +
geom_tile() +
scale_fill_distiller(palette = "YlOrRd", trans = "reverse") +
coord_equal()
ggplot(data = df.p, aes(x = X, y = Y, fill = upr)) +
geom_tile() +
scale_fill_distiller(palette = "YlOrRd", trans = "reverse") +
coord_equal()
Or in plotly:
z.p_l <- matrix(data = preds$fit[,2], nrow = 35, ncol = 51, byrow = TRUE)
z.p_u <- matrix(data = preds$fit[,3], nrow = 35, ncol = 51, byrow = TRUE)
Plot:
plot_ly(x = ~X.p, y = ~Y.p, z = ~z.p, type = "surface", colors = "YlOrRd") %>%
add_surface(x = ~X.p, y = ~Y.p, z = ~z.p_l, opacity = 0.5, showscale = FALSE) %>%
add_surface(x = ~X.p, y = ~Y.p, z = ~z.p_u, opacity = 0.5, showscale = FALSE) %>%
add_markers(data = aquifer, x = ~X, y = ~Y, z = ~H)
Label the residuals:
aquifer$residuals <- ifelse(trend3$residuals > 0, "positive", "negative")
Plot:
ggplot(data = aquifer, aes(x = X, y = Y, color = residuals)) +
geom_point() +
coord_equal()
The plot suggests pockets of positive and negative residuals. Conduct test with Moran’s I.
First create spatial weights based on the 5 nearest neighbors of each observation:
w <- knearneigh(as.matrix(aquifer[,1:2]), k = 5) %>% # find the 5 nearest neighbors
  knn2nb() %>%  # convert to a neighbours list
  nb2listw()    # convert to spatial weights
Moran’s test:
moran.test(trend3$residuals, w)
##
## Moran I test under randomisation
##
## data: trend3$residuals
## weights: w
##
## Moran I statistic standard deviate = 2.5826, p-value = 0.004902
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.145630475 -0.011904762 0.003720749
The test rejects the null hypothesis of spatial randomness (p-value ≈ 0.005), which means that there is residual spatial pattern. If you were asked to guess the value of the residual at location [-25, 100], would you say it was most likely to be positive or negative? How can we use this information to improve the quality of our predictions?
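A side note on the test above: since the residuals come from a fitted regression, `spdep` also provides `lm.morantest()`, which adjusts the expectation and variance of Moran's I to account for the estimated coefficients (a sketch, assuming the `trend3` and `w` objects created above):

```r
# Moran's I for regression residuals, with moments corrected
# for the fact that the residuals come from an lm fit
lm.morantest(trend3, listw = w)
```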